{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Useful stuff\n",
    "\n",
    "This notebook shows you some handy ways to get going with more quantitative things in Python. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Step 1: \n",
    "\n",
    "# Physically download the files to your computer\n",
    "# Unhashtag the line below to install matplotlib)\n",
    "# You only have do do this once\n",
    "\n",
    "# !pip install matplotlib"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Step 2: \n",
    "\n",
    "# Link the newly downloaded files to your current jupyter notebook session\n",
    "# You must do this each time you close and reopen your jupyter notebook session\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Package methods\n",
    "\n",
    "Like strings, lists, and dictionaries, packages have their own methods as well. Since we imported the pyplot module from the matplotlib library as the alias \"plt\", we can use our do notation. Place your cursor after the period in the cell below and press the TAB key so see plotting methods: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Scatterplot"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = [1, 2, 3, 4, 5]\n",
    "print(x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y = [5, 6, 3, 2, 1]\n",
    "print(y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.scatter(x = x, y = y, c = \"blue\", s = 500)\n",
    "plt.axis([0, 8, 0, 12])\n",
    "plt.title(\"Scattplot example\")\n",
    "plt.xlabel(\"Height\")\n",
    "plt.ylabel(\"Width\");\n",
    "# plt.savefig(\"scatter.tiff\") "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# This figure will save in your working directory. Unhashtag the line below to see the location:\n",
    "\n",
    "# pwd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Histogram "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "values = [3,4,4,5,6,7,8,5,4,3,2,4,5,6,7,8,9,6,5,4,3,2,2,4,5,6,7,8,5,4,3,3,2]\n",
    "print(values)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.hist(x = values, bins = 10, color = \"orange\", orientation = \"horizontal\");\n",
    "# plt.savefig(\"hist.tiff\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Pandas\n",
    "\n",
    "Pandas is the standard way to work with tabular data in Python - think a two-dimensional spreadsheet organized into rows and columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# !pip install pandas"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Pandas also has some neat methods!\n",
    "pd."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Make an example data frame\n",
    "data_frame = pd.DataFrame({\"A\":[1,4,5,7,9],\n",
    "                           \"B\":[4,7,2,5,2], \n",
    "                           \"Height\":[44,55,77,33,22]}) \n",
    "data_frame"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Import .csv files"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mock = pd.read_csv(\"./MOCK_DATA.csv\")\n",
    "mock.head(n = 2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.scatter(mock.height, mock.width);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Seaborn"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# !pip install seaborn\n",
    "import seaborn as sns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.lmplot(x = \"height\", y = \"width\", data = mock);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Boxplots"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.boxplot(y = \"height\", \n",
    "            data = mock, palette = \"Set3\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Import .txt files"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# fiji2014.txt\n",
    "text = open(\"../Day_4/data/txts/fiji2014.txt\", encoding=\"utf-8\").read()\n",
    "print(text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(text.count(\"human\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# t-test"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# !pip install scipy\n",
    "import scipy\n",
    "from scipy import stats"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = [2, 4, 6, 9, 10]\n",
    "y = [3, 6, 22, 11, 19]\n",
    "print(x)\n",
    "print(y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "scipy.stats.ttest_ind(x, y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Or, using a data frame\n",
    "scipy.stats.ttest_ind(mock.height, mock.width)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Linear regression"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "scipy.stats.linregress(x = data_frame.A, y = data_frame.B)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
